E.FARHI neural networks THEORY.DOC 06/1992.
****************************************************************************
LET'S NOW SEE HOW IT WORKS
****************************************************************************
I don't intend to write a thesis about it, but here are some important points
in order to understand network theory.
II) AN INTRODUCTION TO NEURAL NETWORKS
**************************************
II.1) Where does it come from?
-------------------------------
Nature is certainly what's best on earth, so man tries to imitate
it. We thought about copying neural systems for computing and electronic
applications because we were not very satisfied with what a usual computer can
do (just think about handwriting recognition and space optimisation problems).
In the human brain, which is composed of billions of neurons, each neuron
receives nerve impulses through its dendrites and emits a pulse through its
axon. Some neurons are more likely to receive signals (vision, hearing,...)
and others lead to an action (movement, idea,...).
A formal neural network is just a mathematical model of this.
II.2) And now, let me introduce you to the neuron.
--------------------------------------------------
The formal neuron is a tiny little thing that receives some inputs,
computes its synaptic potential, and then emits its state to its friends.
I1 --(W1)--\                'Wi' is the synaptic link (weight).
 .   .      \               'Ii' is the neuron input.
Ii --(Wi)-->(S)--> A
 .   .      /
In --(Wn)--/
Usually we use a linear potential:
S=Sum(Ii*Wi)+B
where 'B' is a constant for each neuron ('bias').
The output ('activity') is:
A=f(S)
where f is either a Heaviside type function or a sigmoid
type function.
       ^(A)                            ^(A)
       |                          (+1) |        _____
   (+1)|--------                       |    .--
       |                               |/
-------+---------->(S)     -----------+--------------->(S)
       |                              /|
-------|(-1)                  ____--'  |
       |                               | (-1)

    Heaviside                       Sigmoid
You see that there is a high (often +1) and a low (why not -1 ?) state
for the neuron. In fact the sigmoid curve is linked to a constant called
'temperature' by analogy to thermodynamic systems, and the Heaviside case
corresponds to T=0.
It is also possible to associate a probability with those curves. In that
case, the only possible states are high and low (without any middle state such
as for the sigmoid). You may have guessed (who knows ?) that it is called the
'stochastic' model.
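To make this concrete, here is a minimal sketch in Python (not part of
NEURON.ZIP; the function names and the tanh form of the sigmoid are just one
possible choice):

    import math

    def heaviside(s):
        # Heaviside case (T=0): only a high (+1) and a low (-1) state.
        return 1.0 if s >= 0.0 else -1.0

    def sigmoid(s, temperature=1.0):
        # Sigmoid case: a smooth curve between -1 and +1; the smaller the
        # 'temperature', the closer it gets to the Heaviside step.
        return math.tanh(s / temperature)

    def neuron_output(inputs, weights, bias, f=heaviside):
        # Synaptic potential S = Sum(Ii*Wi) + B, then activity A = f(S).
        s = sum(i * w for i, w in zip(inputs, weights)) + bias
        return f(s)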
II.3) How the net works...
--------------------------
It would not have been very interesting to stop here. So, with some
neurons, we build a network. To do that, it is just necessary to define links
between neurons.
To simplify, we group neurons into different layers. If neurons of a
single layer are linked together, it is a 'Hopfield' type layer, else it is a
'Perceptron' type layer. The final network is formed of various different
layers...
                                        ____________
                                       /            \
    (1)   ...   (n)                  (1)--(2) ... (n)
   / | \       / | \                  |    |       |
  (1) ... (i) ... (n')
  / | \   / | \   / | \

      Perceptron                         Hopfield
If we use more than one Perceptron layer, we obtain a ...
multi-perceptron network. This structure is often used for handwriting
recognition and some expert systems. The Hopfield structure can learn fewer
prototypes, and is a dynamic system, evolving towards a stable state. It can
also be used as an auto-associative memory, but isn't very efficient (to tell
you the truth).
There are many other types of networks, such as Kohonen networks,
Boltzmann machines, the auto-feedback model, etc... but don't worry, I won't
talk about them in this doc ! If you really want to know something about them, I
invite you to read some of the books in the bibliography list.
II.4) Let's go to school
------------------------
If you want to use your network, you need to teach it something nice.
There are two main algorithms used for learning. A Hopfield network can learn
directly with the Hebb rule, and a multi-layer network learns better with the
back-propagation rule (an extension of the Widrow-Hoff rule).
Now, let's consider a Hopfield network. You now know that it has just a
single connected layer (say, 'n' neurons). Suppose you want to teach it to
memorize a prototype composed of 'n' data (usually binary data, +1 or -1). In
that case a prototype is just an input. The output you want to teach to the net
IS the input. You just apply:
Wij(t+1)=Wij(t)+m*Pi*Pj (Hebb rule)
where Pi, Pj are the inputs of the prototype associated with
the neurons i and j.
m>0 is a constant or, better, the boolean value (i<>j).
That procedure has to be repeated for each prototype, as long as the
network hasn't memorized it, that is to say until the prototype is
a stable state for the net (don't forget that such a net has its own dynamics).
For a Hopfield net, the prototypes can be memorized if they are mutually
orthogonal, and the net is stable if Sign(Sum(Wij*Pj))=Sign(Pi).
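A minimal sketch of that learning procedure (assuming a fully connected
layer and taking m as the boolean value (i<>j), as suggested above; the names
are mine):

    def hebb_learn(prototypes, n):
        # Wij(t+1) = Wij(t) + m*Pi*Pj, with m = 0 when i == j (no self-link).
        w = [[0.0] * n for _ in range(n)]
        for p in prototypes:                 # each prototype: n values in {+1, -1}
            for i in range(n):
                for j in range(n):
                    if i != j:
                        w[i][j] += p[i] * p[j]
        return w

    def is_stable(w, p):
        # The prototype is memorized if Sign(Sum(Wij*Pj)) = Sign(Pi) for every i.
        for i, row in enumerate(w):
            s = sum(wij * pj for wij, pj in zip(row, p))
            if (s >= 0.0) != (p[i] >= 0.0):
                return False
        return True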
For a multi-layer network, it's a little more complicated. Basically, we
set the input of the net, then compute the global state, compare the output to
the expected one (we are teaching the net an input-output prototype), and then
compute a local error for each output neuron. The 'error gradient
back-propagation' algorithm consists in calculating the local error for each
neuron, from the output towards the input, and changing the synaptic links.
To apply this method, we need a sigmoid type neuron. In fact we try to
reduce the quadratic error:
QE=Sum((Ai-Oi)^2)/2
where Oi is the expected (reference) output.
Let's consider a net formed with the layers 'i', 'j' and 'k'. The local
error for an output neuron is:
Ek=(Ak-Ok)*f'(Sk) 'k' index is for the output layer.
where f' is the derivative of the sigmoid.
For an intermediate layer (index 'j') the error is:
Ej=Sum(Wjk*Ek)*f'(Sj)
The link Wij is then modified:
Wij=Wij-m*Ej*Ai (m>0)
NB: we also have Wjk=Wjk-m*Ek*Aj, and a similar relation for the bias B.
Typically, for a three-layer net, 'k' is the output, 'i' the input and
'j' is a 'hidden' layer.
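Here is a minimal sketch of one back-propagation step for such a
three-layer net, written with plain Python lists (the tanh sigmoid and the
learning rate m=0.1 are my own choices):

    import math

    def fprime(s):
        # derivative of the tanh sigmoid with respect to the potential S
        return 1.0 - math.tanh(s) ** 2

    def backprop_step(x, o, wij, wjk, bj, bk, m=0.1):
        # Forward pass: input layer 'i' (values x), hidden 'j', output 'k'.
        sj = [sum(wij[i][j] * x[i] for i in range(len(x))) + bj[j]
              for j in range(len(bj))]
        aj = [math.tanh(s) for s in sj]
        sk = [sum(wjk[j][k] * aj[j] for j in range(len(aj))) + bk[k]
              for k in range(len(bk))]
        ak = [math.tanh(s) for s in sk]
        # Local errors: Ek = (Ak-Ok)*f'(Sk), then Ej = Sum(Wjk*Ek)*f'(Sj).
        ek = [(ak[k] - o[k]) * fprime(sk[k]) for k in range(len(o))]
        ej = [sum(wjk[j][k] * ek[k] for k in range(len(o))) * fprime(sj[j])
              for j in range(len(aj))]
        # Link and bias corrections: Wij = Wij - m*Ej*Ai, Wjk = Wjk - m*Ek*Aj.
        for i in range(len(x)):
            for j in range(len(aj)):
                wij[i][j] -= m * ej[j] * x[i]
        for j in range(len(aj)):
            for k in range(len(o)):
                wjk[j][k] -= m * ek[k] * aj[j]
            bj[j] -= m * ej[j]
        for k in range(len(o)):
            bk[k] -= m * ek[k]
        # Quadratic error QE = Sum((Ak-Ok)^2)/2, useful to watch the learning.
        return sum((a - t) ** 2 for a, t in zip(ak, o)) / 2.0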
Anyway, for every network, we can define some statistical variables:
E=-Sum(Wij*Ai*Aj)/2 Global network energy.
W=Sum(Wij)/N Mean synaptic link.
QW=Sum(Wij^2)/N Quadratic (mean squared) link.
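These can be computed directly from the links and activities; a short
sketch (I read N as the number of neurons, which is one possible convention):

    def net_statistics(w, a):
        n = len(a)
        e  = -sum(w[i][j] * a[i] * a[j] for i in range(n) for j in range(n)) / 2.0
        wm =  sum(w[i][j]              for i in range(n) for j in range(n)) / n
        qw =  sum(w[i][j] ** 2         for i in range(n) for j in range(n)) / n
        return e, wm, qw    # energy E, mean link W, quadratic link QW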
II.5) What's up doc ? (this is the most interesting)
---------------------
All right. Now your network has learned some prototypes. We are going
to see a few properties of your net.
What a net !
The memorization is performed thanks to the synaptic links, so the
information isn't localized (as it is in a conventional logic computer). You can
even partially destroy the net without losing the information ! (but of course
if you remove too many neurons, the output will have nothing in common with what
you expect). This explains why the human brain loses thousands of neurons daily,
but you don't feel it until you're very old !
Memorization capacity.
Anyway, don't be too hopeful: your net has a limited memorization
capacity. If you make it learn too many things, it may forget everything in a
big mess ! In fact, the number of prototypes it can memorize depends on the
number of neurons and links.
To quantify this, we define
alpha=P/C
where P is the number of prototypes and
C is the mean number of connections per neuron.
The thermodynamic study shows that for a Hopfield network, there is a
critical value: alphaC ~ 0.14.
If alpha<alphaC, the prototypes are memorized.
At alpha=alphaC, a first-order transition occurs.
And if alpha>alphaC, the network saturates and forgets everything.
This study has been done by analogy with a magnetic spin system.
Usually, a multi-layer network has a critical alpha between 1 and 2,
and we can avoid saturating the net by limiting the variations of the synaptic
links; the net then forgets the oldest prototypes progressively.
So, a Hopfield net can't learn too many things, but a multi-layer one
is much more efficient.
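A quick worked example of the alpha=P/C rule (the 100-neuron figure is
only there to illustrate the arithmetic):

    n = 100                 # fully connected Hopfield net of 100 neurons
    c = n - 1               # mean number of connections per neuron
    alpha_c = 0.14          # critical value for a Hopfield net
    p_max = alpha_c * c     # about 14 prototypes before the net saturates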
Generalization capacity
The network knows what you haven't told it ! In fact it can predict and
extrapolate an output for an input it doesn't know. This is why you can use it
for character recognition. It is called an auto-associative memory.
It has been shown that the better a net knows its prototypes, the less
it can generalize, so you have to find a compromise. I usually use a 5 % to 10 %
error rate.
How does it really work?
Basically, a Perceptron layer is a classifier. It defines a partition
of the prototypes into 2^n classes limited by hyperplanes in the solution space.
Inside the hidden layers, the net builds an internal representation of each
prototype. Studying those layers shows how the network memorizes and organizes
its information.
For a Hopfield layer, it's different. The memorized prototypes are in
fact the minima of the energy function. That's why we can use such a net for
function optimization problems and shape recognition.
II.6) What you can do with it
-----------------------------
Here are some particular applications of neural networks. Of course,
there are many others...
Parity recognition
We use a three-layer perceptron network:
input 8 neurons from 0 to 255.
hidden 8 neurons
output 1 neuron -1=odd,+1=even.
You teach the right parity for some numbers (for instance 1 to 15) and
the net will extrapolate for other inputs.
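For instance, reusing the backprop_step sketch from section II.4 (the
+1/-1 bit coding and the random starting links are my own choices):

    import random

    def to_bits(number):
        # 8 input neurons: the bits of the number, coded as +1 / -1.
        return [1.0 if (number >> b) & 1 else -1.0 for b in range(8)]

    wij = [[random.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(8)]
    wjk = [[random.uniform(-0.5, 0.5)] for _ in range(8)]
    bj, bk = [0.0] * 8, [0.0]

    for epoch in range(2000):
        for number in range(1, 16):                      # teach the parity of 1..15
            target = [1.0 if number % 2 == 0 else -1.0]  # +1 = even, -1 = odd
            backprop_step(to_bits(number), target, wij, wjk, bj, bk, m=0.1)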
Data compression
If you want to compress pictures or sound, you can use a three-layer
perceptron network:
input n1 neurons a bit of data
hidden n2 neurons n2<n1 -> compression
output n1 neurons input=output for learning.
Inside the hidden layer, a compression (with possibly some losses) is
performed because n2<n1. To code the picture, for instance, you just need to
save the internal representation of the hidden layer. To decode the compressed
image, you set the state of the hidden layer to the saved state.
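A sketch of the coding/decoding part (the learning itself is just
backprop_step with the output prototype equal to the input; the names are
mine):

    import math

    def compress(x, wij, bj):
        # The 'code' is the internal representation of the hidden layer:
        # n2 activities instead of the n1 input values (n2 < n1).
        return [math.tanh(sum(wij[i][j] * x[i] for i in range(len(x))) + bj[j])
                for j in range(len(bj))]

    def decompress(aj, wjk, bk):
        # To decode, set the hidden layer to the saved state and read the output.
        return [math.tanh(sum(wjk[j][k] * aj[j] for j in range(len(aj))) + bk[k])
                for k in range(len(bk))]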
Function optimisation
Suppose you've got a problem in which you need to find the extrema of a
function - for example, the traveling salesman problem - you can then use a
Hopfield network whose energy function is the function you are studying. The
only difficulty is to find the link variations you need to impose.
E=-Sum(Wij*Ai*Aj) global energy.
Si=Sum(Wij*Aj) we assume B=0 here.
Ai=f(Si) f: Heaviside or sigmoid.
Of course, you know the Hebb rule and your expression of E, the function
to optimize. You need to differentiate E to find a condition on Wij. I leave
the computation to you...
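Whatever links you end up with, the relaxation itself is simple; here is
a sketch of the dynamics that slides the net towards a minimum of E
(asynchronous updates, B=0 as above):

    import random

    def hopfield_relax(w, a, steps=1000):
        # Each update Ai = sign(Sum(Wij*Aj)) can only lower E = -Sum(Wij*Ai*Aj),
        # so the net evolves towards a stable state, i.e. a minimum of E.
        n = len(a)
        for _ in range(steps):
            i = random.randrange(n)
            s = sum(w[i][j] * a[j] for j in range(n))
            a[i] = 1.0 if s >= 0.0 else -1.0
        return a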
Character and shape recognition.
You can use a Hopfield network, but you'd better try it with a
multi-layer network (more efficient, better capacity). You just need to teach
the net different types of digitized characters (for instance, for a 10*14
character you need 140 input neurons). For the output, you associate a neuron
with each character (for instance 26 neurons for one alphabet). Possibly you can
use a single-layer network (an input and an output).
It's even better if you process the picture before sending it to the
net (thinning, removing isolated pixels,...).
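Once such a net is trained (with backprop_step, one input-output
prototype per character), reading its answer could look like this (the
40-neuron hidden layer is an arbitrary choice of mine):

    import math

    N_INPUT, N_HIDDEN, N_OUTPUT = 140, 40, 26   # 10*14 pixels in, one neuron per letter

    def recognize(pixels, wij, wjk, bj, bk):
        # Forward pass only: the answer is the output neuron with the highest activity.
        aj = [math.tanh(sum(wij[i][j] * pixels[i] for i in range(N_INPUT)) + bj[j])
              for j in range(N_HIDDEN)]
        ak = [math.tanh(sum(wjk[j][k] * aj[j] for j in range(N_HIDDEN)) + bk[k])
              for k in range(N_OUTPUT)]
        return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[ak.index(max(ak))]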
(next to read: NEURON.DOC and NETWORK.DOC)